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Abstract 

We present randomized algorithms to compute the sumset (Minkowski 
sum) of fwo integer sets, and to multiply two univariate integer polyno¬ 
mials given by sparse representations. Our algorithm for sumset has cost 
softly linear in the combined size of the inputs and output. This is used 
as part of our sparse multiplication algorithm, whose cost is softly linear 
in the combined size of fhe inputs, output, and the sumset of the supports 
of the inputs. As a subroutine, we present a new method for computing 
the coefficients of a sparse polynomial, given a set containing its support. 
Our multiplication algorithm extends to multivariate Laurent polynomials 
over finite fields and rational numbers. Our techniques are based on sparse 
interpolation algorithms and results from analytic number theory. 


1 Introduction 

Sparse pol 5 momials are a fundamental object in computer algebra. Computer 
algebra programs including Maple, Mathematica, Sage, and Singular use a 
sparse representation by default for multivariate polynomials, and there has 
been considerable recent work on how to efficiently store and compute with 
sparse pol 5 momials [Fateman, 2003, Gastineau and Laskar, 2013, Monagan and Pearce, 
2009, Roche, 2011, van der Hoeven and Lecerf, 2012]. 

However, despite the memory advantage of sparse pol 5 momials, the alter¬ 
native dense representation is still widely used for an obvious reason: speed. 

It is now classical [Cantor and Kaltofen, 1991] that two degree-D dense poly¬ 
nomials can be multiplied in softly linear time: O(DlogDloglogD) ring op¬ 
erations, and even better in many cases [Harvey et al., 2014b]. By contrast, two 
size-T sparse pol 5 momials require 0{T^) operations, and this excludes the po¬ 
tentially significant cost of exponent arithmetic. 


1 


Much of the recent work on sparse arithmetic has focused on "somewhat 
dense" or structured cases, where the sparsity of the product is sub-quadratic 
[Monagan and Pearce, 2009, Roche, 2011, van der Hoeven and Lecerf, 2012], At 
the same time, sparse interpolation algorithms, which in the fastest case can 
learn an unknown T-sparse pol 5 momial from O(T') evaluations, have gained 
renewed interest [Cuyt and Lee, 2008, Javadi and Monagan, 2010, Comer et al., 

2012, Arnold et al.]. 

Most closely related to the current work, [van der Hoeven and Lecerf, 2013] 
recently presented algorithms to discover the coefficients of a sparse pol 5 mo- 
mial product, provided a list of the exponents and some preprocessing. In the 
context of pattern matching problems, [Cole and Hariharan, 2002] gave a Las 
Vegas algorithm to multiply sparse pol 5 momials with normegative integer co¬ 
efficients whose cost is 0(Tlog^ D). 

A remaining question is whether output-sensitive sparse multiplication is 
possible in time comparable to that of dense multiplication. This paper an¬ 
swers that question, with three provisos: First, our complexity is proportional 
to the "structural sparsity" of the output that accounts for exponent collisions 
but not coefficient cancellations; second, our algorithms are randomized and 
may produce incorrect results with controllably low probability; and third, we 
ignore logarithmic factors in the size of the input. 

To explain the first proviso, define for a pol 5 momial F its support supp(F) to 
be the set of exponents of nonzero terms in F. The sparsity of F, written #F, is 
exactly #supp(F). For two polyomials F and G, we have #supp(FG) < #F ■ #G. 

But in many cases the set of possible exponents 

poss(F, G) = {ep + ec : ep e supp(F),eG e supp(G)} 

is much smaller than #F ■ #G. This structural sparsity T = # poss(F, G), is an 
upper bound on the actual sparsity S = #supp(FG) of the product. Strict in¬ 
equality S < T occurs only in the presence of coefficient cancellations. Part of our 
algorithm's cost depends only on the actual sparsity, and part depends on the 
potentially-larger structural sparsity. 

Our algorithms have not yet been carefully implemented, and we do not 
claim that they would be faster than the excellent software of [Gastineau and Laskar, 

2013, Monagan and Pearce, 2013] and others for a wide range of practical prob¬ 
lems. However, this complexity improvement indicates that the barriers be¬ 
tween sparse and dense arithmetic may be weaker than we once thought, and 
we hope our work will lead to practical improvements in the near future. 

1.1 Our contributions 

Our main algorithm is summarized in Theorem 1.1. Here and throughout, we 
rely on a version of "soft-oh" notation that also accounts for a bound y on 

the probability of failure: 0^{(p) = C9(<^ • po\y\og{(p/y)), for any function (p, 
where polylog means log”^ for some fixed c > 0 [von zur Gathen and Gerhard, 

2013, see sec. 25.7]. 
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Theorem 1.1. Given F,G G Z[xj with degree bound D > deg F + deg G and height 
bound C > ||F||j^ + ||G||^, and }i G (0,1), Algorithm SparseMultZZ correctly 
computes the product H = FG with probability exceeding 1 — p, using worst-case 
expected 0^(SlogC + TlogD) bit operations, where S = #supp(FG) and T = 
# poss(F, G) are the actual and structural sparsity of the product, respectively. 

The linear dependence on log D and log C means this result extends easily 
to multivariate pol 5 momials and finite field coefficienfs, using fhe Kronecker 
subsfifufion. 

Our algorifhm relies on two subroufines, bofh of which are based on fech- 
niques from sparse inferpolafion and rely on number-fheorefic resulfs on fhe 
availabilify of primes. 

The firsf subroufine Sumsef (.4, B) compufes fhe sumset of fwo sefs of infe- 
gers A and B, defined as 

A®B={a + b-.aeA,beB}. 

This algorifhm, which may be of independenf inferesf, has soffly-linear com¬ 
plexify in fhe size of fhe oufpuf A® B. 

The second subroufine SparseMulCoeffs(F, G, S) requires a sef confaining 
supp(FG) in order fo compufe FG in fime soffly-linear in fhe inpuf and oufpuf 
sizes. If is based on an algorifhm in [van der Hoeven and Lecerf, 2013], buf is 
more efficienf for large exponenfs. 

The main sfeps of our mulfiplicafion algorifhm are: 

1. Use Sumsef fo compufe poss(F, G). 

2. Run SparseMulCoeffs wifh S = poss(F, G) buf wifh smaller coefficienfs, 
fo discover fhe frue supp(FG). 

3. Run SparseMulCoeffs again, wifh fhe smaller exponenf sef supp(FG) buf 
wifh fhe full coefficienfs. 

Sfeps 1 and 2 work wifh a size-T exponenf sef buf wifh small coefficienfs, 
and bofh confribufe Oji{Tlog D) fo fhe overall bif complexify. Sfep 3 uses fhe 
size-S frue supporf buf wifh fhe full coefficienfs, and requires 0^(S(logD -|- 
log C)) bif operafions, for a fofal of (T log D -|- S log C). 

1.2 Organization of the paper 

Section 2 sfafes our nofafional conventions and some sfandard resulfs, and Sec- 
fion 3 confains fhe fechnical number fheorefic resulfs on which we base our 
randomizations. 

Section 4 revisifs and adapfs our sparse inferpolafion algorifhm from ISSAC 
2014 fhaf will be a subroufine for our sumsef algorifhm, presenfed in secfion 5. 

Our new mefhod fo find fhe coefficienfs, once fhe supporf is known, is pre¬ 
senfed in Secfion 6. This is fhen used in concerf wifh our sumsef algorifhm in 
Secfion 7 fo describe fully fhe algorifhm of Theorem 1.1, and also fo explain 
how fhis can be easily exfended fo oufpuf-sensifive sparse mulfiplicafion over 
.. .,x,f^], where R is Zm, Q, or GF(p'^). 
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2 Background and Preliminaries 

We count the cost of our algorithms in terms of bif complexify on a random- 
access machine. We sfafe fheir cosfs using nofafion, meaning fhaf our algo- 
rifhms have a factor log"^ ^ in fhe rurming fime. We can make c = 1 by rurming 
fhe enfire algorithm with error bound | some 0 (log i) times, then returning 
the most frequent result. 

Our main algorithm depends on an unknown number-theoretic constant, 
as discussed in Section 3. Thus we have proven only the existence of a Monfe 
Carlo algorifhm. We also discuss how fhis could be easily handled in pracfice. 

Our randomized procedures refurn eifher fhe correcf answer (wifh confrol- 
lable probabilify 1 — pi), or an incorrecf answer, or fhe symbol Fail. Whenever 
a subroufine refurns Fail, we assume fhe calling procedure refurns Fail as 
well. 

2.1 Notation and Representations 

We lef R denote a commutative ring with identity. For F G R[x] we let (F) c 
R[x] denote the ideal generated by F. For n, m G Z, m > 0, we let n rem m 
and n rem m denote the integers s G [0, m) and t G [—m/2, m/2) respectively, 
such that n = s = t (mod m). We write for Z/mZ, t 5 rpically represented 
as {n rem m\n G Z}. 

Unless otherwise stated we assume F G R [x] is of the form 

F(x) = Ei<i<sCiX‘^‘^ ( 1 ) 

with coefficients c, G R and exponents e; G Z>o. Often we assume each c; 7 ^ 0 
and the exponents are sorted ei < ■ ■ ■ < es, but it is sometimes useful to relax 
these conditions. 

We write S > #F and D > deg F for the sparsity and degree bounds. When 

def 

R = Z we write C > ||F||^ = max; \ci\ for the height of F. We also use the 
norm ||F||j = Ei \ci\- 

More generally, we consider multivariate Laurent pol 5 momials 

s 

F CjXj'i ■ ■ ■ x(^'" G R[x*\..., xj^]. 

z=l 

In the case R = Z, the sparse representation of F consists of a tuple (n, C,D,S), 
followed by a list of S tuples (c,, {en ,..., ei„)), where each c; is stored using 
0(logC) bits and each e^ is stored using 0(logD) bits. When R is instead a 
finite ring, C is omitted and each c, is stored using 0(log |R|) bits. 

When multiplying F, G G Z[x], we assume shared bounds C and D so that 
total input/output size is 

(5((#F + #G + #(FG))(logC + logD)), (2) 
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given that ||FG||j < C^min(#F,#G). 

The dense representation of an n-variate pol 5 momial F is an ^-dimensional 
array of D” coefficienfs, where exponenfs are implicifly stored as array indices. 
In fhe case thaf R = Z with bound C as above, this requires 0(D” log C) bits. 

The terms "sparse pol 5 momial" and "dense pol 5 momial" refer only to the 
choice of represenfafion, and nof to the relative number of nonzero coefficients. 
Typically, we assume F is sparse and reserve F to indicate a dense pol 5 momial. 
Converting between the sparse and dense representations has softly linear cost 
in the combined input/output size. 

When computing a sumset A 0 B, we assume that every integer in A or B is 
represented using ©(log D) bits, where D > max(||A||,^, ||B||,^). In this setting 
the combined bit-size of A, B, and A © B is 0{{#A + #B + #(A © B)) log D). 

2.2 Integer and Polynomial Arithmetic 

We cife fhe following results from infeger and pol 5 momial arithmetic, which 
we use throughout. 

Fact 2.1 ([Harvey et al., 2014a], Thm 9.8; [von zur Gathen and Gerhard, 2013], 
Cor. 9.9). Let m,n G Z. Then the following may be computed using 0{\ogm + 
log n) operations: mzLn, mn, m rem n, m fern n. Also, arithmetic operations in Z„ 
require C9(logM) hit operations. 

Fact 2.2 ([von zur Gathen and Gerhard, 2013], Thm. 10.25). Given m^ G Z>o, 
and Vi rem m,- for 1 < i < t and M = n[=i 'w,-, one can compute the solution 
V G [0, M) to the set of congruences v = Vi (mod m,), 1 < z < f using O(logM) 
bit operations. 

We use Chinese Remaindering to construct exponents from sets of congru¬ 
ences. We also use dense pol 5 momial arifhmefic as a subroutine, and cite the 
following resulfs. 

Fact 2.3 ([Harvey et al., 2014b]). Let F, G G Zm[^], deg F, deg G < D. Then the 
following may he computed using 0{Dlog m) hit operations: F ± G, FG, F rem G. 

In particular, dense arithmetic operations in Z^ [x] / (xP — 1) may be com¬ 
puted in ©(p log m) bit operations. 

Assume F G Z[x] as in (1) with bounds D, S, and C as described above. 
Our algorithm performs arithmetic on modular images of F. For F G Zm[x], 
we represenf F(x) mod (xP — 1) G Zm[x]/(xP — 1) by the remainder from di¬ 
viding Zm(x) by (xP — 1). Namely F(x) mod (xP — 1) is a pol 5 momial wifh 
degree less than p. Note we treat F(x) rem (xP — 1) and F(x) mod (xP — 1) as 
elements of R[x] and R[x] / (xP — 1) respectively. To reduce a sparse pol 5 momial 
F mod (xP — 1), we reduce each exponent modulo p, and then add like-degree 
terms. By Fact 2.1, we have: 

Corollary 2.4. Given any F G Z,„[x], we can compute F mod (xP — 1) using 
0(S(log D + logm)) bit operations. 
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Procedure GetPrime(A, p) 

Input: A > 21; ^ e (0/1)- 

Output: Integer p G (A,2A], s.t. Pr[p not prime] < p. 

1 repeat OT = [(5/6) In Aln(l/p)] times 

2 p •<— random odd integer from (A, 2A] n Z 

3 if p is prime then return p 

4 return Fail 


Procedure GetVanishPrime(S, D, 7 , p) 

Input: Integers S, D e Z>o; 7 G (0,1]; p G (0,1). 

Output: Integer p, s.t. for any sef S safisfying #S < S and ||S||^ < D, 
wifh probabilify af leasf 1 — p, p is a 7 -vanish-prime for S. 

1 A ■<— max ^21, ^ min ^S, In 

2 return GefPrime(A, p/2) 


3 Number-theoretic subroutines 

3.1 Choosing primes 

We first recall how to choose a random prime number. 

Fact 3.1 (GoroIIary 3, [Rosser and Schoenfeld, 1962]). i/A > 21, then the number 
of primes in (A, 2A] is at least 3A/ (5 In A). 

We test if p is primein (!l(polylog(p)) time viafhe mefhod in [Agrawal ef al., 
2004]. This fesf and Facf 3.1 lead fo procedure GefPrime. 

Lemma 3.2. GetPrime(\, p) works as stated and has bit complexity (!l^(polylog(A)). 

Proof. The sfafed cosf follows from fasf primalify fesfing due fo [Agrawal ef al., 
2004]. The probabilify fhaf any chosen p is prime is af leasf 6 /(5 In A), from 
Facf 3.1. Therefore, using fhe facf fhaf (1 — x) < exp(—x) for any nonzero x G 
]R, fhe probabilify fhaf none of fhe chosen p are prime is af mosf (l — < 

exp {f^) < p, as desired. □ 

If is frequenfly useful fo choose a random prime fhaf divides very few of 
fhe infegers in some unknown sef S C Z. If a fraction of 7 infegers in S do nof 
vanish modulo p, fhen we call p a ^-vanish-prime for S. We call a 1-vanish-prime 
for S a good vanish-prime. Procedure GefVanishPrime shows how fo choose a 
random 7 -vanish-prime. 

Lemma 3.3. Procedure GetVanishPrime works as stated to produce a ^-vanish-prime 
p satisfying 

pGO(i min logo) 


6 











Procedure GetDiffPrime(S, D, 7, pi) 

Input: Integers S, D e Z>o; 7 G (0/1]; F G (0/1)- 

Output: Integer p, s.t. for any set S satisfying #S < S and diam(S) < D, 
wifh probabilify at least 1 — fi, p is a 7 -difference-prime for S. 

^ ^ ^ “ 1) T^) 

2 if A < 21 then return GetPrune(21, p/2) 

3 else if A > D then return GetPrime(D, p/2) 

4 else return GetPrime(A, p/2) 


and has bit complexity polylog(p). 

Proof. Let S be any subset of infegers with #S < S and ||S||oa < D. Write 
M = n« 6 S I'^l < write k for fhe number of "bad primes" for which 

more fhan (1 — 7)5 elemenfs of S vanish modulo p. Since each p > A, fhis 
means fhaf < M, and because M < D®, k < In D/((l — 7 ) In A) is an 

upper bound on fhe number of bad primes. 

If 1 — 7 is very small, we insfead use a similar argumenf fo say fhaf fhe num¬ 
ber of primes for which any elemenf of S vanishes is af mosf k < S In D / In A. 

Then Facf 3.1 guaranfees fhe prevalence of bad primes among all primes 
in (A,2A) is af mosf p/2, so fhe probabilify of geffing a bad prime, or of erro¬ 
neously refuming a composife p, is bounded by p. □ 

3.2 Avoiding collisions 

A closely relafed problem is fo choose p so fhaf mosf infegers in a sef S are 
unique modulo p. We say fhaf a eS collides modulo p if fhere exisfs fo G S wifh 
a = b (mod p). We say p is a 'y-difference-prime for S if fhe fracfion of infegers 
in S which do nof collide modulo p is af leasf 7 . A 1-difference-prime is called 
a good difference-prime for S. Procedure GefDiffPrime shows how fo compufe 
difference-primes, condifioned on fhe diameter of fhe unknown sef S: 

diam(S) max(S) — min(S). 

Lemma 3.4. Procedure GetDiffPrime has bit complexity polylog(p) and works as 
stated to produce a y-difference-prime p satisfying p G 0{D) and 

peO (is min (S, logo) . 

Proof. Lef S be any sef as described. An elemenf a G S collides modulo p iff fhe 
producf of differences Y[bes,b^a{‘^ ~ vanishes modulo p. If p > D fhis can 
never happen. Ofherwise, as each such producf is af mosf diam(S)®“^ < DS- 1 , 
fhe resulf follows from fhe Lemma 3.3, seffing fhe D of fhe lemma fo D^~^. □ 
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Our algorithms often perform arifhmefic modulo (xf — 1). Similar fo fhe 
notion of collisions above for a sef of infegers modulo p, we say fwo disfincf 
ferms cx'^ and c'x‘^' of F E R[x] coZ/zde modulo (xP — 1) if e = e' (mod p). 

Example 3.5. Let F = x + x^ + 3x^. Then F mod (x^ — 1) = 2x + 3x^. The term 
2x is the image of x + x^. We say x and x^ collide modulo (x^ — 1), whereas 3x^ 
uniquely maps to 3x^. 

Essentially, reduction modulo (xP — 1) "hashes" exponent e E supp(F) to 
e fern p. If p is a good difference-prime for supp(F) and q is a good vanish- 
prime for fhe coefficienfs of F, fhen F rem (xP — 1) wifh coefficienfs reduced 
modulo q has fhe same sparsify as F ifself. 

3.3 Primes in arithmetic progressions 

Sometimes we implicifly reduce exponenfs modulo p by evaluating af pfh roots 
of unity. In such cases we need to construct primes q such that p\{q — 1), and 
to find pfh roots of unity modulo each q. 

In principle, this procedure is no different than the previous ones, as there 
is ample practical and theoretical evidence to suggest that the prevalence of 
primes in arithmetic progressions without common divisors is roughly the 
same as their prevalence over the integers in general. 

However, the closest to Fact 3.1 that we can get here is as follows, which is 
a special case of Lemma 7 in [Fouvry, 2013]. 

Lemma 3.6. There exists an absolute constant Aq such that, for all A > Aq, and for 
all but at most A/ In^ A primes p in the range (A, 2A], there are at least In A 
primes q in the range (A^-®^, 2A^-®^] such that p\(q — 1). 

Proof. Set K = 0.53, which means (1.89)“^ < K < ^. Fixing s = 1, and for 
any R > 2, Lemma 7 in [Fouvry, 2013] guarantees the existence of positive 
constants xjc and x^ such that the following holds: For all x > max(x]c, 
and for all but R/ln^ R integers r E (i^,2K], there are at least x^x/{(p{r) Inx) 
primes q in the range (x, 2x] such that r\{q — 1), where cp(r) is the Euler totient 
function. 

Setting Aq = max(xjc, 3.78/a]c)/ and letting R = A, r = p, and x = the 
statement of our lemma holds because 

cp{r) Inx = (p — 1) In A^ ®^ < 3.78Aln A, 

xkx A°-®^ 

^(r)lnx (p —l)lnA^®^ ^ InA 

Lemma 3.6 forms the basis for Algorithm GetPrimRoots, where we assume 
that the constant Ag is given. Since this constant has not actually been com¬ 
puted, a reasonable strategy would be to choose some small "guess" for Ag 
and run the algorithm until it does not report failure. If the algorithm fails, it 
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Procedure GetPrimRoots(D, T, C, }i) 


Input: D > degF; T > #F; C > ||F||^; ji £ (0,1); where F £ Z[x] is fixed 
but unspecified. 

Output: Prime p, primes {qi, .. q^), and integers (mj, ..., or Fail. 

1 m ^ [lg|] 

2 A ^ max (|786,Ao,^mT(r- l)lnD,l-351n3-^^(2C)) 


3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 


a ^ [l.lln(2C)ln^A] 
repeat ni times 
p ^GetPrime(A, 

Al ■<— fl distinct even integers in [2,2A*^-®^] 

Q, W empty lists 
foreach a £ Al do 

q i — up -j- 1 

f ■<— random nonzero element of Zq 
if q is prime and mod q then 
Add to Q and a; = to W 
if HijeQ q >'2-C then return p, Q, and W 


14 return Fail 


could be due to the random prime p being an "exception" in Lemma 3.6, or due 
to unlucky guesses for the primitive roots or due to the guessed constant Aq 
being too small. Because our primality tests are deterministic, failure due to Ag 
being too small is detectable by the algorithm returning Fail. 

We state the running time and correctness, assuming Ag is sufficiently large, 
as follows. 

Lemma 3.7. Procedure GetPrimRoots has worst-case bit complexity 
Oft (log C • polylog (T + log D)). 

With probability at least 1 — y, it returns a good difference-prime p for F, primes 
qi,...,qitt such that Hi qt > 2C, and pth primitive roots modulo each qi, oj\, ..., cok- 

Proof. The lower bound A > max(786,1.35 In^'^^ (2C)) guarantees that there are 
sufficiently many even integers in the range [2,2A*^ ®^] in order for Step 6 to be 
valid, since for any A > 786, we have A'^ ®^ > A'^^ In^ A > 1.1 ln(2C) In^ A. 

For the running time, the outer loop does not affect the complexity in our 
notation because m £ 0(log jj) £ Oft(l). Observe also that 

log A £ polylog ^Ag + T + log D + log C + . 

The running time is dominated by the AKS primality tests in the inner loop, 
which are performed 0(mlogCpolylog(A)) times, each at cost 0(polylog(A)), 
giving the stated worst-case bit complexity. 
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All of the checks for primality of p and q/s, as well as the test that each w, 
is a pth primitive root of unity modulo qi, are deterministic. Therefore the only 
possibility that the algorithm returns an incorrect result other than Fail is the 
probability that p is not a good difference-prime for supp(F). According to the 
proof of Lemma 3.4, the condition A> — l)lnD, and using the union 

bound over all outer loop iterations, the probability that any of the chosen p's 
is not a good difference-prime is less than p/2. 

Consider next a single iteration of the outer loop. This will produce a valid 
output unless insufficiently many good p,'s and co/s are found for that choice 
of p. 

From Fact 3.1 and Lemma 3.6, the probability that p is an exception to the 
lemma is at most 5/ (3 In A), which is less than | from the bound A > 786. 

If p is not an exception, then Lemma 3.6 tells us that the probability of each 
q being prime is at least When q is prime, since prime p divides {q — 1), 
the probability that each is a p-PRU in Zq is (p — l)/p, easily making the 
total probability of successfully adding to O and W at each loop iteration at 
least 0.99/(In A). 

Let fl > 1.1 ln(2C) In^ A be the size of A. By Hoeffding's inequality ([Hoeffding, 
1963], Thm. 1), the probability that fewer than 0.03fl/ (In A) integers are added 
to Q after all iterations of the irmer loop is at most 

exp(-2fl(0.96/lnA)2) < exp(-2.021n(2C)) < 0.25, 

where the last inequality holds because C > 1. 

Therefore, with probability at least |, and using again A > 786, at least 

0.03fl/lnA = 0.0331n(2C)lnA > ln(2C)/lnA 

integers are added to Q each time through the inner loop. Since each p, > A, 
this means Hi ‘]i > 2C, and the algorithm will return on Step 13. 

Combining with the probability that p is an exception, we conclude that the 
probability the algorithm does not return in each iteration of the outer loop is at 
most 1/2. As this is repeated [Ig times, the probability is less than p/2 that 
the algorithm returns Fail. Using the union bound with the probability that 
any p is not a good difference-prime, we have the overall failure probability 
less than p. □ 

4 Multiplying via Interpolation 

Let F, G G Z[x] be sparse pol 5 momials with C = ||F||,^ -F ||G||,^ and D = 
degF -F deg G. The subroutine Sumset computes poss(F, G) by first reduc¬ 
ing the degrees and heights of the input pol 5 momials and then multiplying 
them. However, it cannot perform the multiplication using a recursive call to 
SparseMultZZ because the degrees are never reduced small enough to allow 
the use of dense arithmetic in a base case. 
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Procedure SparseInterpBB(F, G, a, r) 

Input: F,G e Z^[x]; a G Z^; r e Z>o. 

Output: H{az) mod (z’’ — 1), where H = FG. 

1 (F, G) ^ iF{oiz) rem [z’’ — 1), G(az) rem (z’' — 1)) 

2 H ^ F • G rem (z'^ — 1) via dense arithmetic 

3 return sparse representation of H 


Procedure BasecaseMultiply(F, G, S, }i) 

Input: F,G & Z[x]; S >#F + #G +#(FG); }i G (0,1). 

Output: H G Z[x] such that Pr[H 7 ^ FG] < fi. 

1 q ^GetPrime(2S \\F\\^ l|G||oo + 2degF + 2deg G, 

2 Call procedure MajorityVoteSparseInterpolate from [Arnold ef al., 2014], 
wifh coefficient field Z^, sparsify bound S, degree bound deg F + deg G, 
error bound and black box SparseInferpBB(F, G, -, •)• 


Insfead, we present here a "base case" algorithm which, given F, G, and a 
bound S > #F + #G + #(FG), computes FG, in time softly linear in S,logC, 
and polylog(D). Any algorithm with such running time suffices; we will use 
our own from [Arnold ef al., 2014], which is a Monfe Carlo sparse inferpolafion 
algorifhm for univariafe pol 5 momials over finife fields. 

To adapf [Arnold ef al., 2014] for mulfiplicafion over Z[x], we firsf choose 
a "large prime" q > max(2C, 2D) and freaf F, G and fheir producf H = FG as 
pol 5 momials over F^. This size of q ensures fhaf no exfension fields are neces¬ 
sary. The subroufine SparseInferpBB specifies how fhe unknown pol 5 momial 
H = FG G Fq [x] will be provided fo fhe algorifhm. If exacfly matches the sorts 
of black-box evaluations that [Arnold et al., 2014] requires. The entire proce¬ 
dure is stated as BasecaseMultiply. 

Lemma 4.1. The algorithm SparseInterpBB works correctly and uses 0(Slog Dlog^-|- 
rlogq) bit operations. 

Proof. Correctness is clear. To compute F(az), we replace every term cx^ of F 
and G wifh ca'^x'^ G Zg. This cosfs ©(Slog D log ij) by binary powering. Reduc¬ 
ing modulo (z’’ — 1) cosfs ©(S(logD -|- logq)) by Corollary 2.4. Dense arifh- 
mefic cosfs 0{rlogq) bif operafions by Facf 2.3. Summing fhese cosfs yields 
0{SlogDlogq + rlogq). □ 

Lemma 4.2. The algorithm BasecaseMultiply correctly returns the product H = FG 
with probability at least 1 — p, and has bit complexity (^S log^ D (log C -|- log D)^ . 

Proof. In order fo use SparseInferpBB in fhe algorifhm from [Arnold ef al., 2014], 
we simply replace fhe sfraighf-line program evaluafion on fhe firsf line of pro¬ 
cedure Compufelmage with our procedure SparseInterpBB. Again, note that 
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as the prime q was chosen with q > 2D, the MajorityVoteSparseInterpolate 
algorithm does not need to work over any extension fields. 

The correctness is guaranteed by the previous lemma, as well as Theorem 
1.1 in [Arnold et al., 2014]. For the bit complexity, in Section 7 of [Arnold ef al., 
2014], we see fhaf fhe cosf is dominafed by O^(logD) calls fo fhe black box 
evaluaf ion f uncf ion, each of which is supplied q G 0{C + D), and r eO^(SlogD). 
Applying fhe bif complexify of Lemma 4.1 gives fhe sfafed resulf. □ 


5 Sumset Algorithm 

Lef A,B e Z n ( — D, D) be nonempfy, R = #A + #B, and S = #(.4 © B) 
fhroughouf fhis section. We prove as follows: 

Theorem 5.1. Procedure Sumset{A, B, }i) has bit complexity 0^(Slog D) and pro¬ 
duces A(B B with probability at least 1 — p. 

We compufe fhe sumsef A(BB as supp(H), H = FG G Z[x“^,x], where 
F = and G = Here H has exponenfs in (—2D,2D) and 

ll^^llco < R. Thus if suffices fo consfrucf fhe exponenfs of H modulo £ > AD. 
Moreover, we have supp(H) = poss(F, G), and fhaf 

R-1 <#{A®B) = S <R^. (3) 

5.1 Estimating Sumset Output Size 

We firsf show how fo compufe a fighfer upper bound on fhe frue value of 
S = #H = #(A©B). To fhis end, lef p G 0(D) be a good difference-prime 
for supp(H), using fhe naive bound R^ from (3), and define fhe Hi = FiGi, 
where Fi, Gi G Z[x] are defined by 

Fi = F rem (x^ — 1), Gi = G rem (x^ — 1). 

Then deg Hi < 2p and each ferm cx® of H corresponds fo eifher one or two 
ferms in Hi, of degrees e rem p and (e rem p) + p. Therefore 

#Hi/2 <#H = S< #Hi. 

We will compufe an approximation S* ~ S such fhaf S*/2 < #Hi < S*, 
and fherefore S*/4 < S < S*. To fhis end we presenf a fesf fhaf, given S*, 
always accepfs if #Hi < S* and probably rejecfs if #Hi > 2S*. We do fhis for 
S* inifially 2, doubling whenever fhe fesf rejecfs. 

Given fhe currenf esfimafe S*, we nexf choose a (l/2)-difference-prime q 
for fhe supporf of any 2S*-sparse pol 5 momial wifh degree 2p > deg Hi, and 
compufe H* = Hi mod (x*? — 1). We work modulo m = R? > ||H||i > ||H*||oo, 
such fhaf none of fhe coefficienfs of H* vanish modulo m. If Hi is S*-sparse 
fhen H* is as well. If Hi has 2S* ferms fhen, as fewer fhan S* ferms of Hi are in 
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collisions, H* is not S*-sparse. As no terms of Hi vanish modulo m, additional 
terms in Hi can only increase #H*. 

We choose q so that the test is correct with probability at least 3/4. By 
iterating [8 ln(8/ ff) ] times, by Hoeffding's inequality, the probability is at least 
1 — }i/A that the test runs correctly in at least half of the iterations. As #Hi < 
2R^, it suffices that the test is correct [log 2 R + 1] times. 

The Sumset procedure performs this test on lines 3-7. By Corollary 2.4 the 
respective costs of constructing F* and F* mod (R^, x'? — 1) are O^(RlogD) 
and O^(RlogplogR), and similarly for G*. The cost of the dense arithmetic 
here is Oft (q log R). Given that p 6 Ofi(D),q E (S log p), the total bit-cost 
of this part is (S log D). 

5.2 Computing Sumset 

Armed with the bound S*/4 < S < S*, we aim to compute H = FG. Our 
approach is to compute images 

Hi = FiGi mod ~ 1)/ 

H2 = F{{£ -)- l)x)G{{£ + l)x) mod x^ — 1), 

for an integer £ = 8D > max(degH, HHHj). 

Since the coefficients of H 2 are scaled by powers of (/ -|-1), a single term 
cx‘^ in the original pol 5 momial H becomes cx^ P in Hi and c{£ -|- l)'^x '^P 
in H 2 , and if they are uncollided we can discover {£ + iy by computing their 
quotient. Modulo £^, this quotient (/ -|- ly is simply e£ + 1, from which we 
can obtain the exponent e. This idea is similar to the "coefficient ratios" tech¬ 
nique suggested by [van der Hoeven and Lecerf, 2014], but working modulo 
allows us to avoid costly discrete logarithms. Procedure Sumset contains 
the complete description. 

Sumset has four steps that are probabilistic: choosing a good difference- 
prime p, estimating the sumset size S = #{A(B B), and constructing Hi and 
H 2 . As each is set to fail with probability less than p/4, Sumset succeeds with 
probability at least 1 — p. 

We now analyze the total cost of this algorithm. GetDiffPrime produces p 
of size log p < polylog(R -|- log D + ji )- Constructing Fi, F2, Gi, G2 at the be- 
girming, and the reduction of Hi, H 2 modulo {£^, x^ — 1) at the end, both cost 
Oii{S log D) bit operations. 

The search for S* costs C2^(SlogD) from the previous section. Finally, as 
\\Fi 11^ < £12, degFi < p, and similarly for F 2 , Gi, and G 2 , the sparse multipli¬ 
cations due to BasecaseMultiply also costs Ofi{SlogD) bit operations. These 
dominate the complexity as stated in Theorem 5.1. 
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Procedure Sumset( A B, }i) 

Input: A,BCZ with #A + #B = R and \k\ < D; ^ (0/1)- 

Output: Set S C Z such that Pr[S 7 ^ .4 © < ji. 

1 p •(— GetDiffPrime(R^, 4D, 

3 S* ^2 

4 repeat [max( 81 n( 8 /p),log 2 R + 1 )] times 

5 q ^GetDiffPrime(2S*, 2p, j, |) 

6 H* ^FiGimod (R2 ,x‘?- 1) , via dense arithmetic 

7 if #H* > S* then S* ^ 2S* 


8 £<^8D 

9 (F2,G2)^ (E«6^K+l)x™P,EbeB(W + l)x^) 

10 Hi ^BasecaseMultiply(Fi, Gi, S*, p/ 4 ) 

11 H 2 ^BasecaseMultiply(F 2 , G 2 , S*, ji/4:) 

12 for; = 1,2 do Hj A- Hj mod {i^,xP — 1) 


13 S ^ empty list of integers 

14 for every nonzero term cx'^ of H-[ do 
c' ■(r- coefficient of degree-e term of H 2 
if c I c' and i \ {c'/c — 1) as integers then 

|_ Add (c'/c — V)!I to S 

else return Fail //cannot reconstruct an exponent 

19 return S 


6 Multiplication with support 

We turn now to the problem of multiplying sparse F, G 6 Z [x], provided some 
S 5 supp(FG). This algorithm is used twice in our overall multiplication al¬ 
gorithm: first with large S = poss(F, G) but small coefficients, then with the 
actual support S = supp(FG) but full-size coefficients. 

The bit-complexity of our algorithm is summarized in the following theo¬ 
rem. 

Theorem 6.1. Given F,G & Z[x] and asetSG Z>o such that supp(FG) C S, the 
product FG can be computed in time ((#F + #G + #S) (log C + log D)), where 
G = ||F||oo + l|G||oo ‘ind D > degF + degG. 

This is O^roptimal, as it matches the bit-size of the inputs. 

Our algorithm requires a small randomly-selected good difference-prime 
p with 0(logS + log log D) bits, and a series of pairs (q,co), where each q is 
a slightly larger prime with 0 (log p) bits, and co is an order-p element in Z^. 
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These numbers are provided by GetPrimRoots (Sec. 3). Our algorithm works 
by first reducing the exponents modulo p, then repeatedly reducing the coeffi¬ 
cients modulo q and performing evaluafion and inferpolafion af powers of oj. 
This irmer loop follows exacfly fhe algorifhm of [van der Hoeven and Lecerf, 
2013] and [Kalfofen and Yagafi, 1989] for applying a fransposed Vandermonde 
mafrix and ifs inverse. Since p is a good difference-prime for fhe supporf of fhe 
producf, fhere are no collisions and fhis gives us each coefficienf modulo q. The 
process is fhen repealed 0(log C) limes in order fo recover fhe full coefficienfs 
via Chinese remaindering. 

6.1 Comparison to prior work 

Wifhouf affecfing fhe complexify, we may assume fhaf S confains fhe supporf 
of fhe inpufs foo, i.e., supp(F) and supp(G). We also assume fhaf maxS = 
deg FG, such fhaf no e G S is foo large fo be an exponenf of FG. Under fhese 
assumptions, and writing S = #S, fhe sfafed complexify of our algorifhm is 
simply Oj,(S(logC -h log D)). 

The problem of computing fhe coefficienfs of a sparse producf, once fhe 
exponenfs of fhe producf are given, has been recenfly and exfensively invesfi- 
gafed by van der Hoeven and Lecerf, where fhey presenf an algorifhm whose 
bif complexify (in our nofafion) is 

Of^iiLees'^oge) (logD + logC)) 

([van der Hoeven and Lecerf, 2013], Corollary 5). As log e E 0{S log D), 
fhe algorifhm here saves a facfor of af mosf O(logD) in comparison, which 
could be subsfanfial if fhe exponenfs are very large. 

Their algorifhm is more efficienf if fhe supporf supersef S is fixed, in which 
case fhey can move fhe mosf expensive parfs info precompufafion and com- 
pufe fhe resulf in fhe same soff-oh time as our approach, (S(log D -)- log C)). 
Furfhermore, fhe supporf bif-lengfh Iliegs log e is af mosf 0(S log D), buf could 
be as small as n(S log S -|- log D), for example if fhe supporf confains only a sin¬ 
gle large exponenf. In such cases our savings is only on fhe order of (log D) / S. 

6.2 Transposed Vandermonde systems 

Applying transposed Vandermonde systems, and their inverses, is an impor¬ 
tant subroutine in sparse interpolation algorithms, and efficient algorithms are 
discussed in detail by [Kalfofen and Yagafi, 1989] and [van der Hoeven and Lecerf, 
2013]. We restate the general idea here and refer the reader to those papers for 
more details. 

If F is a dense pol 5 momial, it is well known that applying the Vander¬ 
monde matrix V(0i,..., Fp) to a vector of coefficients from F corresponds to 
evaluating F at the points 0i..., Applying the inverse Vandermonde ma¬ 
trix corresponds to interpolating F from its evaluations at those points. The 
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product tree method can perform both of fhese using 0{D) field operations 
([von zur Gafhen and Gerhard, 2013], Ghapfer 10). 

If F is insfead a sparse pol 5 momial F = evaluafing F af consec- 

ufive powers of a single high-order elemenf co corresponds fo mulfiplicafion 
wifh fhe fransposed Vandermonde mafrix: 

V{cv^^ .. csf = (F(l). F{co^-^)f 

The fransposifion principle fells us if is possible fo compufe fhe maps 
and (in essenfially fhe same time as dense evaluation and inferpolafion. 
In particular, if fhe coefficienfs c, are in fhe modular ring Zq, fhen fhe frans¬ 
posed Vandermonde map and ifs inverse can be compufed using 0{S lo^q) 
bif operafions [van der Hoeven and Lecerf, 2013]. 

6.3 Statement and analysis of the algorithm 


Procedure SparseMulGoeffs(S, F, G, }i) 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 


Input: Exponents S = (ei, e2,...es)', coefficient lists (/i,... ,/s) and 
(gl,.. .,gs)/with F, G e Z[x] implicitly defined as 
F = Ei<i<sfi^‘^‘ and G = X)i<;<s error bound ji e (0,1). 
Output: {hi,.. .,hs) E Z^ such that, with probability least 1 — }i, 

FG = Li<i<shiX''‘- 
C E- (maxi<;<s |/•|)(maxl<,■<s |g,|)S 
P, Q, •<— GetPrimRoots(max S, #S, C, p) 

Sp E- {ei fern p, 62 rem p,. .., £3 p) 

H E- list of S empfy lisfs of infegers 
foreach {q,co) G Q, Wdo 
foreach e,p E Sp do 
1 ^ Vj E- LO^'v E Zq by binary powering 

V(i;i. vsYih . /s)^eZS 

b ^ V{vi,...,vsY{gi,...,gsY e ZS 

c ^ {aibi,...,asbsY E Z^ 
if V(vi ,.. .,vs) is invertible then 

{hip, ... ,hsp) E- (V(i;i,...,i;s)^)"^c e Z® 
for 1 < f < S do Add h^p fo fhe lisf H[t\ 


14 for 1 < f < S do 

15 L hj E- Ghinese remaindering from images H[i] modulo infegers in Q 

16 return {hi,.. .,hs) 
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Lemma 6.2. Procedure SparseMulCoeffs works as stated when S 2 supp(FG). In 
any case it has bit-complexity 



Proof. We first analyze the probability of failure when S 3 supp(FG). The 
randomization is in the choices of p, q, and co; problems can occur if these 
lack the required properties. 

If p is a good difference-prime for supp(FG), then by definition there will 
be no collisions in Sp. Furthermore, if a; is a pth root of unity modulo q, then 
there are no collisions among the values (uj,..cg), so V{vi,..., 0 $) is in¬ 
vertible modulo q. Algorithm GetPrimRoots ensures this is the case with high 
probability, and if so the algorithm here faithfully computes each coefficient hi 
modulo q. 

Conversely, if there are no collisions in Sp, and if no zero divisors modulo q 
are encountered in the application of the Vandermonde matrix and its inverse, 
then the algorithm correctly computes then values /z, mod q, even if p is not 
prime or some w G is not actually a pth root of unity. 

Therefore all failures in choosing tuples p, q, oj are either detected by the al¬ 
gorithm or do not affect its correctness. Since that is the only randomized step, 
we conclude that the entire algorithm is correct whenever the input exponent 
set S contains the support of the product. 

For the complexity analysis first define D = deg(FG). Step 2 costs Op (log C ■ 
polylog(S -|- log D)) bit operations by Lemma 3.7. Reducing each exponent e; 
modulo p, on step 3, can be done for a total of Op (I^egs log e) bit operations. 

Now we examine the cost of the for loop that begins on step 5. As the 
exponents are now all less than p, computing each Vj on step 7 requires only 
O(logp) operations modulo q, for a total of 0(Slog plogt?), which is Op(S • 
polylog(log C -|- log D)) bit operations. From before we know that applying the 
transposed Vandermonde matrix and its inverse takes Op(Slogzj), or Op(S • 
polylog(log C -)- log D)) bit operations. 

Because #Q < [logp(2C)], the loop on step 5 repeats 0(log C) times, for a 

total cost of 0/t(S log C ■ polylog(log D)) bit operations. This also bounds the 
cost of the Chinese remaindering in the final loop. □ 

7 Multiplication algorithms 

The complete multiplication algorithm over Z [x] that was outlined in the in¬ 
troduction is presented as SparseMultZZ. 

Proof of Thm. 1.1. Unless failure occurs, we have Si = poss(F, G). Every coef¬ 
ficient in H is a sum of products of coefficients in F and G, so the value Cu 
computed on step 2 is an upper bound on ||H||pQ, and p is a good vanish-prime 
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Procedure SparseMultZZ(F, G) 

Input: Sparse F, G e Z[x]; e (0,1). 

Output: Sparse H £ Z[x], such that Fr[H ^ FG] < }i. 

1 Si ^Sumset(supp(F),supp(G), 

2 Ch ^ ||F||^ ||G||^max(#F,#G) 

3 p •«— GetVanishPrime(#Si, C, 1, |) 

4 Hi •<— SparseMulCoeffs(F rem p, G rem p. Si, 

5 S 2 ^ supp(Hi rem p) 

6 return SparseMulCoeffs(F, G,S2 ,f),S2 


for H. Thus S 2 = supp(H rem p) = supp(H), so the final sfep correcfly com- 
pufes fhe producf FG. 

By fhe union bound. Lemma 3.3 and Theorems 5.1 and 6.1, fhe probabilify 
of failure is less fhan p. 

Writing T = # poss(F, G) and S = #supp(FG),weseefhaf log p < polylog(r + 
log C + i), fhus sfeps 1 and 4 confribufe Op ( T log D) fo fhe overall cosf, whereas 
fhe lasf sfep cosfs 0^(5(log D + log C)) bif operations. As S < T, fhe fofal bif 
complexify is (T log D + S log C), as required. □ 

We now consider exfensions of fhis algorifhm fo posifive and negative ex- 
ponenfs (Laurenf pol 5 momials), multiple variables, and ofher common coeffi- 
denf rings, using Kronecker subsfifufion. This is sfafed in fhe following fheo- 
rem. 

Theorem 7.1. Let F,G £ be sparse Laurent polynomials over R, 

where R = Z, Zm, GF(q®), or Q. The product FG can be computed using 

Op{T{nlogD + B)) 

bit operations, where T = # poss(F, G) is the structural sparsity of the product, D > 
max,' I deg, (FG)|, and B is the largest bit-length of any coefficient in the input or 
output. 

Proof. Wrife fhe oufpuf pol 5 momial H = FG as 

LJ _ wT y.^in 

11 — L-ii—1 -^2 ' 

where each c,- £ R and each e,'y satisfies |e,'y| < D. 

We firsf apply fhe Kronecker subsfifufion, providing an easily-inverfible 
map between R[x^^,..., and R[z^^]: x,- 1 —)• z^' ^ for I < i < n. This 

increases fhe degree fo D", such fhaf fhe logarifhm of fhis degree 0(n log D), 
mafching fhe exponenf bif-size in fhe mulfivariafe represenfafion. 

The algorifhm Sumsef already handles negative exponenfs (i.e., Laurenf 
pol 5 momials) explicifly. The ofher primary subroutine fo procedure SparseMulfZZ 
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is SparseMulCoeffs, which only uses the exponents in the set Sp, which are re¬ 
duced modulo p and therefore cause no difficulty if fhey are negative. Thus 
the multiplication algorithms handle univariate Laurent pol 5 momials without 
any changes. 

To extend the multiplication algorithm and it subroutines beyond R = Z, 
we use that our algorithm is also softly-linear in the input heights. This allows 
us to adapt to many coefficient domain that provides a natural mapping to 
the integers, and to preserve softly-linear time if fhaf mapping provides only a 
soffly-linear increase in size. 

For a modular ring R = Zm, we can trivially treat the inputs as actual inte¬ 
gers, then reduce modulo ni after multiplying. For a finite field R = GF(p'^), el- 
emenfs are f 5 rpically represenfed as pol 5 momials over Zp [z] modulo a degree-d 
irreducible pol 5 momial, so fhese coefficienfs can be converfed fo infegers using 
a low-degree Kronecker subsfifufion. For fhe rafionals R = Q, we mighf choose 
a prime q larger fhan fhe producf of fhe largesf numerafor and denominator in 
fhe outpuf, multiply modulo q, fhen use rational reconsfrucfion fo recover fhe 
acfual coefficienfs. 

In all fhe above cases, fhere is growfh in fhe bif-lengfh of coefficienfs, buf 
only in poly-logarifhmic terms of inpuf and outpuf size, fherefore nof affecfing 
fhe soff-oh complexify. The only downside is fhaf we are no longer able fo 
splif fhe cosf neafly befween T = poss(F, G) and S = supp(FG) because fhe 
unreduced integer pol 5 momial producf mighf have nonzero coefficienfs which 
are really zeros in R. □ 
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